Delta Lake | ifkarsyah

Projects

Lakehouse Platform

A self-service data lakehouse built on Databricks and Delta Lake, unifying batch and streaming workloads with a single storage layer.

Data Engineering | Spark, Databricks, Delta Lake

Blog Posts

Debezium Series, Part 5: Sink Connectors — Delta Lake & Iceberg

Landing CDC events into open table formats: upsert and delete semantics with Delta Lake MERGE and Iceberg MERGE INTO, partition strategies, and the JDBC sink for relational targets.

Streaming | Debezium, Kafka

Databricks Series, Part 6: ML Serving and Workflows

Batch and real-time model inference, Databricks Model Serving endpoints, and orchestrating the full ML pipeline with Databricks Workflows.

Data Engineering | Databricks, Delta Lake

Databricks Series, Part 5: Machine Learning with MLflow

Tracking experiments, logging models and artifacts, comparing runs, and managing the model lifecycle with MLflow on Databricks.

Data Engineering | Databricks, Delta Lake

Databricks Series, Part 4: Feature Engineering at Scale

Databricks Feature Store, FeatureEngineeringClient, FeatureLookup, training sets, and eliminating training-serving skew.

Data Engineering | Databricks, Delta Lake

Databricks Series, Part 3: Data Ingestion with Auto Loader

cloudFiles format, schema inference, schema evolution, and building robust incremental ingestion pipelines on Databricks.

Data Engineering | Databricks, Delta Lake

Databricks Series, Part 2: Lakehouse Architecture

Unity Catalog for governance and discovery, the medallion Bronze/Silver/Gold pattern, and Delta tables as the storage foundation.

Data Engineering | Databricks, Delta Lake

Databricks Series, Part 1: Getting Started

Navigating the Databricks workspace, launching clusters, writing notebooks, and submitting your first PySpark job.

Data Engineering | Databricks, Delta Lake

Databricks Series, Part 0: Overview

The lakehouse platform concept, what Databricks adds on top of Spark and Delta Lake, and how it compares to alternatives.

Data Engineering | Databricks, Delta Lake

Delta Lake Series, Part 6: Streaming & CDC

Writing to Delta with Structured Streaming, exactly-once guarantees, reading Delta as a stream, and Change Data Feed for downstream propagation.

Data Lake | Delta Lake, Spark

Delta Lake Series, Part 5: Performance Optimization

Making Delta Lake queries fast — OPTIMIZE, Z-ordering, data skipping with column statistics, compaction, and partitioning strategies.

Data Lake | Delta Lake, Spark

Delta Lake Series, Part 4: Time Travel & Versioning

Querying historical snapshots by version or timestamp, rolling back bad writes, auditing the table history, and managing retention with VACUUM.

Data Lake | Delta Lake, Spark

Delta Lake Series, Part 3: Schema Enforcement & Evolution

How Delta Lake validates schemas on write, rejects incompatible data, and handles controlled schema changes over time.

Data Lake | Delta Lake, Spark

Delta Lake Series, Part 2: Transaction Log & ACID

How the Delta Lake transaction log enables atomicity, serializable isolation, optimistic concurrency, and conflict resolution.

Data Lake | Delta Lake, Spark

Delta Lake Series, Part 1: Getting Started

Creating Delta tables, reading and writing with Spark, Delta SQL, and what the _delta_log looks like in practice.

Data Lake | Delta Lake, Spark

Delta Lake Series, Part 0: Overview

The data lake reliability problem, what Delta Lake adds on top of Parquet, and how it compares to Apache Iceberg and Apache Hudi.

Data Lake | Delta Lake, Spark